Abstract

We have applied the Zipf method to extract the exponent for 
seven financial indices (DAX, FTSE; DJIA, NASDAQ, S&P500; Hang- 
Seng and Nikkei 225), after having translated the signals into a text 
based on two letters. We follow considerations based on the signal 
Hurst exponent and the notion of a time dependent Zipf law and 
exponent in order to implement two simple investment strategies for 
such indices. We show the time dependence of the returns. 



PACS numbers: 05.45.Tp, 89.65.Gh, 89.69.+x

1 Introduction 

Usually analysts recommend investment strategies based e.g. on "moving
averages", "momentum indicators", and similar techniques. [1], [2] As soon as
econophysicists discovered scaling laws in financial data, it was of interest to 
search for some predictive value from the laws through some extrapolated 
evolution. E.g. a technique known as detrended fluctuation analysis (DFA) 
which measures the deviation of correlated fluctuations from a trend was de- 
veloped into a strategy known as the local (or better instantaneous) DFA in 
order to predict fluctuations in the exchange rates of various currencies, Gold 
price and other financial indices. [H, |6[ The statistical analysis of data was 
based on the value of the exponent of the scaling law so found, itself related
to the fractal dimension of the signal, or also to the Hurst exponent of the
so-called rescaled range analysis. Mathematical extensions, so-called q-order
DFA and multifractals, can be found in the literature, though optimization
problems and predictions on the future of fluctuations are apparently not so
evident from these methods. A drawback of the DFA is that it looks at
correlations in the sign of fluctuations rather than at correlations in their
amplitude.

Another sort of data analysis technique is known as the Zipf technique, [0, 
U originating in work exploring the statistical nature of languages. The 
Zipf analysis technique has also been used outside linguistic, financial and 
economic fields. The technique is based on a Zipf plot which expresses the 
relationship between the frequency of words (more generally, events) and 
the rank order of such words (or events) on a log-log diagram; a cumulative 
histogram can be drawn as well. The slope of the best linear fit on such a plot 
corresponds to an exponent s describing the frequency P of the (cumulative)
occurrence of the words (or events) according to their rank R through, e.g.
$P(>R) \sim R^{-s}$.
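
As a rough illustration of how such an exponent may be extracted, the sketch below fits a least-squares line to log(frequency) versus log(rank) and returns minus its slope. The function name zipf_exponent is hypothetical, and the fit is made on the plain (non-cumulative) rank-frequency plot, which is an assumption chosen for simplicity.

```python
import math
from collections import Counter


def zipf_exponent(events):
    """Return s such that frequency ~ rank**(-s), from a log-log least-squares fit.

    Assumption: non-cumulative frequencies are fitted; at least two distinct
    events are needed, otherwise the fit is undefined.
    """
    counts = sorted(Counter(events).values(), reverse=True)
    total = sum(counts)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c / total) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # minus the slope of the fitted line
```

For events whose counts fall off exactly as 1/rank, the fitted exponent is 1 up to rounding.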

There are many instances in which financial and other economic data 
can be described through a log-log (Zipf) plot: e.g., the distribution of
income (Pareto distribution) [RJ, the size of companies H, sociology JTD 
sometimes after translating the financial data into a text [11], [12], [13], [14], [15].
Thus it seems of interest to check whether such a technique can have some 
predictive value in finance. The present report is in line with such previous 
investigations. We present results based on considerations that financial data 
series are similar to fractional Brownian motion-like time series, and usually 
biased. [16] We examine whether a time dependent Zipf law and exponent 
exist and can be used in order to implement simple investment strategies. 

First, it is thus necessary to translate the financial data into a text, based 
on an alphabet with k characters, and to search for words of length m. There
are obviously k^m possible words. They can be ranked according to their
frequency on a log(frequency)-log(rank) diagram. When a linear fit holds, the
relationship can be considered a power law. Moreover, in the spirit of the
local DFA, a local (or "time" dependent) Zipf law or exponent can be
introduced. [16] In this latter reference, we have also considered the effect
of a linear trend on the value of the Zipf exponent.
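
The translation step can be sketched as follows for a binary alphabet. This is only a minimal illustration: the helper names to_text and word_counts are hypothetical, an unchanged closing value is arbitrarily mapped to d, and words are counted with overlap; none of these choices is specified in the text.

```python
from collections import Counter


def to_text(values):
    """Map each day-to-day change to 'u' (up) or 'd' (down).

    Assumption: an unchanged value is counted as 'd'.
    """
    return "".join("u" if b > a else "d" for a, b in zip(values, values[1:]))


def word_counts(text, m):
    """Count all overlapping words of length m (assumption: overlap allowed)."""
    return Counter(text[i:i + m] for i in range(len(text) - m + 1))
```

For k = 2 and a word length m, at most 2^m distinct words can appear in the resulting counter.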

Here below we have translated seven financial index signals (DAX, FTSE;
DJIA, NASDAQ, S&P500; Hang-Seng and Nikkei 225) each into a text based
on two letters u and d. Based on the above considerations we have imagined
two simple investment strategies, and report on the results (or "returns").
From the beginning we stress that a restriction to two letters is equivalent to
examining only correlations in the fluctuation signs. However, the main
interest of the Zipf method is surely its capability to consider amplitude
fluctuations, by defining various fluctuation ranges.

Table 1: Typical financial indices characteristics between Jan. 01, 1997 and
Dec. 31, 2001: Hurst exponent H, (5,2)-Zipf exponent ζ(5,2), bias ε, p_u, p_d,
linear trend slope

                 H           ζ(5,2)      ε         p_u      p_d      Trend
1. DAX           0.51±0.01   0.11±0.05    0.0332   0.5332   0.4668    2.24±0.07
2. DJIA          0.46±0.04   0.11±0.02    0.0172   0.5172   0.4828    2.97±0.07
3. FTSE          0.43±0.03   0.19±0.08    0.0149   0.5149   0.4851    0.90±0.05
4. Hang-Seng     0.47±0.02   0.08±0.02    0.0060   0.5060   0.4940    1.75±0.20
5. Nasdaq        0.56±0.03   0.19±0.08    0.0428   0.5428   0.4572    1.19±0.06
6. Nikkei225     0.47±0.06   0.15±0.04   -0.0204   0.4796   0.5204   -4.69±0.16
7. S&P           0.51±0.04   0.12±0.03    0.0152   0.5152   0.4848    0.38±0.01

2 Data analysis 

The daily closing values of the DAX, FTSE, DJIA, NASDAQ, S&P500, Hang-Seng
and Nikkei 225 indices, from Jan. 01, 1997 till Dec. 31, 2001 (Fig. 1),
have been obtained from http://finance.yahoo.com/. They contain ca. 1250
data points. After translating the financial time series into a text, one
searches for words, and ranks them according to their frequency. On a log-log
plot, the slope of the best linear fit is the Zipf exponent. Elsewhere we have
already shown that the usual Zipf exponent ζ depends on the normalization
process used to calculate the ranks. If the frequency f of occurrence is
normalized with respect to the theoretical one f', i.e. that expected for pure
(stochastic) Brownian processes, one has $f/f' \sim R^{-\zeta'}$. The
theoretical frequency expected for a word in a text based on a binary alphabet
u, d takes into account the number n of characters of type d (and m - n of
type u) in a word. Suppose that in the text, the frequency of a d (u) letter is
p_d (p_u). Usually, a bias exists, i.e. p_u ≠ p_d. Therefore
$f' = p_u^{m-n} p_d^{n}$. Whether or not the ζ and ζ' exponents depend on the
bias has been examined elsewhere. [16] The p_u and p_d values for the seven
indices are reported in Table 1, together with the bias defined here as
ε = p_u - 0.5. The linear trend over the time interval is also given in
Table 1. We have calculated overall Zipf exponent values, ζ(m,k), and give the
ζ(5,2) value for the seven indices in Table 1.
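
The normalization by the theoretical frequency can be sketched as below, assuming a text made only of the letters u and d and overlapping words. The function name normalized_frequencies is hypothetical; it returns the bias and, for each observed word, the ratio f/f' with f' = p_u^(m-n) p_d^n.

```python
from collections import Counter


def normalized_frequencies(text, m):
    """Return (bias, {word: f / f'}) for words of length m in a u/d text.

    Assumptions: text is non-empty, contains only 'u' and 'd', and words
    are counted with overlap.
    """
    p_u = text.count("u") / len(text)
    p_d = 1.0 - p_u
    words = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    total = sum(words.values())
    ratios = {}
    for word, count in words.items():
        n = word.count("d")                      # number of d letters in the word
        f_theory = p_u ** (m - n) * p_d ** n     # f' for an uncorrelated biased text
        ratios[word] = (count / total) / f_theory
    return p_u - 0.5, ratios
```

Ranking the ratios in decreasing order and fitting a line on a log-log plot would then give an estimate of ζ' in the same way as for ζ.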

In the spirit of the so-called local (or better instantaneous) DFA method,
we can consider that a Zipf exponent is time dependent, and thus obtain a local
Zipf law and a local Zipf exponent. Only words of m < 8 letters have
been considered, but not all results are shown for lack of space. The m values
are chosen with the financial background that motivated this study in mind,
e.g. m = 5 is the number of days in a (bank) week!

In general a (one dimensional) financial index can be characterized by a
so-called Hurst exponent H, obtained as follows. The time series is divided
into boxes of equal size, each containing a variable number of "elements".
The local fluctuation at a point in one box is calculated as the deviation
from the mean in that box. The cumulative departure up to the j-th point in
the box is next calculated in all boxes. The rescaled range function is next
calculated from the difference between the maximum and the minimum of the
cumulative departure, i.e. the range, in units of the rms deviation in the
box. The average of the rescaled range in all boxes with an equal size s is
next obtained and denoted by <R/S>. The above computation is then repeated
for different box sizes s to provide a relationship between <R/S> and s, which
is expected to be a power law $\langle R/S \rangle \sim s^{H}$ if some scaling
range and law exist.
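
The procedure above can be sketched as follows. This is a minimal illustration, not the code used for Table 1: the function names are hypothetical, the input is assumed to be the series of increments (day-to-day changes), boxes are taken as non-overlapping, and H is read off as the least-squares slope of log <R/S> versus log s.

```python
import math


def rescaled_range(box):
    """R/S of one box: range of the cumulative departure over the rms deviation."""
    mean = sum(box) / len(box)
    deviations = [x - mean for x in box]
    cumulative, lowest, highest = 0.0, float("inf"), float("-inf")
    for d in deviations:
        cumulative += d
        lowest, highest = min(lowest, cumulative), max(highest, cumulative)
    rms = math.sqrt(sum(d * d for d in deviations) / len(box))
    return (highest - lowest) / rms if rms > 0 else None


def hurst_exponent(increments, box_sizes):
    """Slope of log<R/S> versus log(s); needs at least two usable box sizes."""
    xs, ys = [], []
    for s in box_sizes:
        boxes = [increments[i:i + s] for i in range(0, len(increments) - s + 1, s)]
        values = [r for r in map(rescaled_range, boxes) if r is not None]
        if values:
            xs.append(math.log(s))
            ys.append(math.log(sum(values) / len(values)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A strictly alternating series gives the same <R/S> for every even box size, hence a slope of zero, the extreme antipersistent limit.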

If H = 1/2 one has the usual Brownian motion. The signal is said to 
be persistent for H > 1/2, and antipersistent for H < 1/2. We have calculated
the Hurst exponent [17] by this rescaled range analysis [18] for the seven
financial index signals. Their H values and the corresponding error bars are
given in Table 1. The error bars are those resulting from a best linear fit and 
a root mean square analysis. 

Tests (not shown here) of the stochasticity (or not) of the data can be
based on the surrogate data method [19], in which one either randomizes
the sign of the fluctuations or shuffles their amplitude, and finally observes
whether the error bars (or confidence intervals) of the raw signal and the
surrogate data signal overlap.
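
The two kinds of surrogate series mentioned above can be generated, for instance, as in the following sketch. The function names are hypothetical, and the input is assumed to be the list of fluctuations (increments); an explicit random generator is passed in so that a test can be repeated.

```python
import random


def shuffled_surrogate(increments, rng):
    """Same set of fluctuations, in a random order (temporal order destroyed)."""
    surrogate = list(increments)
    rng.shuffle(surrogate)
    return surrogate


def sign_randomized_surrogate(increments, rng):
    """Same amplitudes in the same order, each with a randomly drawn sign."""
    return [abs(x) * rng.choice((-1, 1)) for x in increments]
```

An exponent (H or ζ) estimated on many such surrogates gives a reference interval against which the value found for the raw signal can be compared.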

Figure 1: The DAX, FTSE, DJIA, NASDAQ, S&P500, Hang-Seng and Nikkei 225
indices, obtained from http://finance.yahoo.com/. They contain ca. 1250
data points, from Jan. 01, 1997 till Dec. 31, 2001.

3 Returns and basic Zipf strategy



The method is based on searching for the probability of a character sequence
at the end of a word. We consider only what can happen the next day, after a
few (m - 1) days. Consider a word of length m - 1, and calculate in all boxes
of size τ the probabilities p_u(t) and p_d(t) to have a character sequence
(c_{t-m+2}, ..., c_{t-1}, c_t; u) and (c_{t-m+2}, ..., c_{t-1}, c_t; d)
respectively, where c_t represents the character at time t. Since only a k = 2
alphabet is used, it is fair to develop a simple strategy based only on the
sign of the fluctuations, thus to use a strategy similar to that implemented
in the "instantaneous" DFA, i.e. when expecting correlated or anticorrelated
fluctuations, in u and d. In order to avoid investment activity when the
choice probability is low we have used a strength parameter for measuring
the relative probabilities, i.e.

$K(t) = \left| \frac{p_u(t) - p_d(t)}{p_u(t) + p_d(t)} \right|$,   (1)

varying between 0 and 1, its value giving the number of shares bought (or
sold) at a certain investment time.
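
One possible reading of this step is sketched below: within the box of the last τ letters, count how often the current (m - 1)-letter word was followed by u or by d, and form K(t) from Eq. (1). The function names, the use of overlapping words and the normalization by the number of words in the box are assumptions made for the illustration.

```python
def next_day_probabilities(text, t, m, tau):
    """Estimate p_u(t) and p_d(t) from the box of tau letters ending at index t."""
    if t - m + 2 < 0:
        raise ValueError("need at least m - 1 letters up to time t")
    window = text[max(0, t - tau + 1): t + 1]
    prefix = text[t - m + 2: t + 1]          # the last m - 1 letters
    n_words = len(window) - m + 1            # overlapping words of length m
    if n_words <= 0:
        return 0.0, 0.0
    n_u = n_d = 0
    for i in range(n_words):
        if window[i:i + m - 1] == prefix:
            if window[i + m - 1] == "u":
                n_u += 1
            else:
                n_d += 1
    return n_u / n_words, n_d / n_words


def strength(p_u, p_d):
    """K(t) of Eq. (1); taken as 0 when the word was never seen in the box."""
    total = p_u + p_d
    return abs(p_u - p_d) / total if total > 0 else 0.0
```

Note that K(t) only depends on the ratio of the two counts, so the choice of normalization does not affect it.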



Table 2: Final returns r(t) (in %) obtained after 5 years on various indices
from strategies Z1 and Z2, as described in the text, when based on a
(m, k=2) Zipf exponent, as compared (second column) to the mere final index
value change

τ = 500, r(t)                Zipf1                       Zipf2
                 Index    m = 3    m = 5    m = 7    m = 3    m = 5    m = 7
1. DAX           77.55    61.33    71.34    62.39    65.01    65.42    32.78
2. DJIA          49.49    47.42    55.73    29.12    34.52    39.12    23.68
3. FTSE          24.30    33.34    30.63    42.98    18.99    29.33    40.92
4. Hang-Seng    -15.48   -23.88   -26.61   -25.13     3.78    -9.81    -9.30
5. Nasdaq        46.55    56.30    39.87    61.41    56.52    65.07    43.85
6. Nikkei225    -50.61   -13.78   -12.39    -8.73    -5.47   -14.60    -7.89
7. S&P           51.16    67.80    80.34   107.79    43.65    57.85    71.33


Results are reported when windows (boxes) of size τ = 500 are
moved along the signal. This value corresponds to a 2 year type investment
window. Notice that the local exponents are usually larger than the average
one, due to finite size effects.

In the Zipf1 (Z1) strategy, we consider that if p_u(t) > p_d(t), a "buy
order" is given. A "sell order" is given for p_u(t) < p_d(t). No order is given
when both probabilities are equal. Results reported in Table 2 pertain to
m = 3, 5, and 7 at the end of the 5 year interval. In the Zipf2 (Z2) strategy
the local linear trend is subtracted before calculating p_u(t) and p_d(t). The
time dependent returns for Z1 and Z2 in the case m = 3, 5, and 7, and for
k = 2 are given in Fig. 2 for the seven financial indices considered here. A
return r(t) (given in %) is defined from

$B\,q(t) = B\,q(t_0)\,[1 + r(t)]$,   (2)

where B q(t) and B q(t_0) are the amounts of money available at time t and at
the beginning t_0 of the investment period respectively, for a share of value
q(t) bought at q(t_0) at the starting date.
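
The decision rule, the trend subtraction and the return of Eq. (2) can be sketched as follows. The names z1_order, detrend and percent_return are hypothetical; in particular, removing the trend by a least-squares line over the box is an assumption, since the text does not say how the local linear trend is estimated.

```python
def z1_order(p_u, p_d):
    """Z1 rule: buy if p_u > p_d, sell if p_u < p_d, no order if equal."""
    if p_u > p_d:
        return "buy"
    if p_u < p_d:
        return "sell"
    return None


def detrend(values):
    """Subtract a least-squares line from the values of one box.

    Assumption: this stands in for the 'local linear trend' of the Z2
    strategy; at least two values are needed.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    stt = sum((i - mean_t) ** 2 for i in range(n))
    stv = sum((i - mean_t) * (v - mean_v) for i, v in enumerate(values))
    slope = stv / stt
    return [v - (mean_v + slope * (i - mean_t)) for i, v in enumerate(values)]


def percent_return(money_start, money_now):
    """r(t) in %, from money_now = money_start * (1 + r(t)), cf. Eq. (2)."""
    return 100.0 * (money_now / money_start - 1.0)
```

In a Z2-like variant, the detrended values of the box would be translated into u and d letters before the probabilities are estimated.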

4 Conclusions 

It appears (Table 2) that there is no immediate simple and general rule
or universal optimum strategy. The latter depends on the volatility, i.e.
the signal roughness, and on the local (m, k)-Zipf exponent value. From the
implemented simple strategies, it appears that "the best returns" are usually
for Z1 with m = 5, except for NASDAQ for which a fine result arises from
a Z1 with m = 7 (or Z2 and m = 5), and for the FTSE, with either Z1 or
Z2 and m = 7. This choice of the m value and the Z1 strategy is conjectured
to be good for large ζ' and "non Brownian" (large H) cases. However for
quasi-Brownian signals (and high/low ζ'), one sees
reduced losses for the NIKKEI when one chooses a Z1 strategy with m = 7;
this is a very large ζ' case. This is rather similar to the FTSE case. Increased
gains are found for S&P with Z1 and m = 7, and for DJIA with Z1 and m = 5,
i.e. when ζ' is close to 0.1. On the contrary for the HS a Z2 and m = 3
strategy should be better, i.e. for ζ' << 0.1. The situation is rather neutral
for the DAX, the choice Z1, m = 5 being favored.

Many other cases could be further considered, and theoretical work
suggested: first one could wonder about signal stationarity. Next either a
nonlinear (thus like a power law) trend or a periodic background could be
subtracted from the raw signal, and the Zipf exponent time variation examined.
Many other strategies are also available.

Figure 2: The time dependent returns for Z1 and Z2 in the case m = 3, 5,
and 7, and for k = 2 for the seven considered financial indices

In summary, we have translated seven financial index signals each into
a text based on two letters u and d, according to the fluctuations as in a
corresponding random walk. We have calculated the Zipf exponent(s)
giving the relationship between the frequency of occurrence of words of length
m < 8 made of such "letters" for a binary alphabet and their rank. We have introduced
considerations based on the notion of a local (or "time" dependent) Zipf law 
(and exponent). We have imagined two simple investment strategies taking 
into account the linear trend of the biased signal or not, and have reported 
the time dependence of the returns. 